
    Scale-Adaptive Neural Dense Features: Learning via Hierarchical Context Aggregation

How do computers and intelligent agents view the world around them? Feature extraction and representation constitutes one of the basic building blocks towards answering this question. Traditionally, this has been done with carefully engineered hand-crafted techniques such as HOG, SIFT or ORB. However, there is no "one size fits all" approach that satisfies all requirements. In recent years, the rising popularity of deep learning has resulted in a myriad of end-to-end solutions to many computer vision problems. These approaches, while successful, tend to lack scalability and can't easily exploit information learned by other systems. Instead, we propose SAND features, a dedicated deep learning solution to feature extraction capable of providing hierarchical context information. This is achieved by employing sparse relative labels indicating relationships of similarity/dissimilarity between image locations. The nature of these labels results in an almost infinite set of dissimilar examples to choose from. We demonstrate how the selection of negative examples during training can be used to modify the feature space and vary its properties. To demonstrate the generality of this approach, we apply the proposed features to a multitude of tasks, each requiring different properties. This includes disparity estimation, semantic segmentation, self-localisation and SLAM. In all cases, we show how incorporating SAND features yields results better than or comparable to the baseline, whilst requiring little to no additional training. Code can be found at: https://github.com/jspenmar/SAND_features
Comment: CVPR201
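
To make the training signal concrete, here is a minimal PyTorch sketch of a contrastive loss over sparse pixel correspondences, where negatives are sampled at random from the other image (the "almost infinite set of dissimilar examples"). The function name, hinge formulation and sampling strategy are illustrative assumptions, not the paper's exact loss.

```python
import torch
import torch.nn.functional as F

def sparse_contrastive_loss(feats_a, feats_b, uv_a, uv_b, n_neg=64, margin=0.5):
    """Hedged sketch: hinge-style contrastive loss over sparse pixel pairs.

    feats_a, feats_b: (C, H, W) dense feature maps of two images.
    uv_a, uv_b: (N, 2) integer pixel coords of corresponding points.
    """
    C, H, W = feats_a.shape
    fa = F.normalize(feats_a[:, uv_a[:, 1], uv_a[:, 0]].t(), dim=1)  # (N, C)
    fb = F.normalize(feats_b[:, uv_b[:, 1], uv_b[:, 0]].t(), dim=1)  # (N, C)

    # Pull corresponding locations together (cosine distance).
    pos_loss = (1 - (fa * fb).sum(dim=1)).mean()

    # Random locations are almost surely dissimilar; how they are chosen
    # shapes the resulting feature space, per the abstract.
    neg_u = torch.randint(0, W, (n_neg,))
    neg_v = torch.randint(0, H, (n_neg,))
    fn = F.normalize(feats_b[:, neg_v, neg_u].t(), dim=1)            # (n_neg, C)
    neg_loss = F.relu(fa @ fn.t() - margin).mean()                   # push apart

    return pos_loss + neg_loss
```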

Genetic structure and colonisation history of European and UK populations of Gammarus pulex

The structure of populations has been studied for many years, and three main factors have been suggested as causes of present-day species distributions: environment, biology and history. With the use of molecular data and advanced phylogeographic approaches it is now possible to distinguish between these main causes of population structuring. The present study considers the extent of population structure in Gammarus pulex on regional (UK) and large geographic (Europe) scales using studies of molecular genetic (allozymes, mtDNA sequencing and microsatellites) and morphological variation.

Molecular analysis of G. pulex in Europe revealed more diversity than previously thought. This was thought to be a consequence of two separate waves of colonisation after the formation of the major drainages in the Miocene. The UK appears to have been colonised once, either from the Elbe, Mosel and Rhine drainages separately or cumulatively across the drainage basins, late in the Pleistocene before the land bridge connection to mainland Europe was submerged. Limited molecular variation in the UK is thought to be a result of reduced genetic variation in the colonising individuals, in turn caused by repeated founder events during population expansion and contraction from European refugia.

A detailed analysis of a 1950 transplantation experiment on the Isle of Man revealed little genetic impoverishment of the introduced population when compared to the source. In contrast, morphological variation increased in the introduced population. Unlike in mainland Europe, there was no historical explanation for the diversity recorded (as the introduced population was so young) and, in the absence of fragmentation, speciation and colonisation, the contemporary forces of gene flow, selection and limited genetic drift are thought to be the determining factors in population structure.

    CERiL: Continuous Event-based Reinforcement Learning

This paper explores the potential of event cameras to enable continuous-time reinforcement learning. We formalise this problem, in which a continuous stream of unsynchronised observations is used to produce a corresponding stream of output actions for the environment. This lack of synchronisation enables greatly enhanced reactivity. We present a method to train on event streams derived from standard RL environments, thereby solving the proposed continuous-time RL problem. The CERiL algorithm uses specialised network layers which operate directly on an event stream, rather than aggregating events into quantised image frames. We show the advantages of event streams over less frequent RGB images. The proposed system outperforms networks typically used in RL, even succeeding at tasks which cannot be solved traditionally. We also demonstrate the value of our CERiL approach over a standard SNN baseline using event streams.
Comment: 9 pages, 10 figures
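
The abstract does not spell out the specialised event layers, so as a stand-in, the sketch below keeps a per-pixel exponentially decaying "time surface" that consumes raw (x, y, t, polarity) events and can be queried at any time, avoiding quantised frames. The class name, decay constant and policy head are all assumptions for illustration, not CERiL's actual layers.

```python
import torch
import torch.nn as nn

class EventDecayLayer(nn.Module):
    """Hypothetical layer acting on raw events via an exponential time surface."""

    def __init__(self, height, width, tau=0.1):
        super().__init__()
        # One surface channel per event polarity.
        self.register_buffer("surface", torch.zeros(2, height, width))
        self.register_buffer("last_t", torch.zeros(()))
        self.tau = tau

    def forward(self, events, t_query):
        # events: (N, 4) float rows of (x, y, t, polarity in {0, 1}), t <= t_query.
        # Decay the existing surface to the query time, then deposit new events
        # weighted by how recently they fired.
        self.surface *= torch.exp(-(t_query - self.last_t) / self.tau)
        decay = torch.exp(-(t_query - events[:, 2]) / self.tau)
        self.surface.index_put_(
            (events[:, 3].long(), events[:, 1].long(), events[:, 0].long()),
            decay, accumulate=True)
        self.last_t.fill_(t_query)
        return self.surface.flatten()

# A tiny policy head consuming the decayed event surface (sizes illustrative).
layer = EventDecayLayer(64, 64)
policy = nn.Sequential(nn.Linear(2 * 64 * 64, 128), nn.ReLU(), nn.Linear(128, 4))
```

Because the surface can be queried at arbitrary times, actions can be emitted whenever an event batch arrives rather than at a fixed frame rate, which is the reactivity benefit the abstract claims.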

    Generalizing to New Tasks via One-Shot Compositional Subgoals

The ability to generalize to previously unseen tasks with little to no supervision is a key challenge in modern machine learning research. It is also a cornerstone of a future "General AI". Any artificially intelligent agent deployed in a real-world application must adapt on the fly to unknown environments. Researchers often rely on reinforcement and imitation learning to provide online adaptation to new tasks, through trial-and-error learning. However, this can be challenging for complex tasks which require many timesteps or large numbers of subtasks to complete. These "long horizon" tasks suffer from sample inefficiency and can require extremely long training times before the agent can learn to perform the necessary long-term planning. In this work, we introduce CASE, which attempts to address these issues by training an Imitation Learning agent using adaptive "near future" subgoals. These subgoals are recalculated at each step using compositional arithmetic in a learned latent representation space. In addition to improving learning efficiency for standard long-term tasks, this approach also makes it possible to perform one-shot generalization to previously unseen tasks, given only a single reference trajectory for the task in a different environment. Our experiments show that the proposed approach consistently outperforms the previous state-of-the-art compositional Imitation Learning approach by 30%.
Comment: Presented at ICRA 2022 "Compositional Robotics: Mathematics and Tools"
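
One plausible reading of "compositional arithmetic in a learned latent representation space" is sketched below: align the current state to the reference trajectory in latent space, then add the demonstration's latent displacement a few steps ahead to form the subgoal. The encoder, the nearest-neighbour alignment and the horizon parameter are all hypothetical, offered only to make the mechanism concrete.

```python
import torch

def next_subgoal(encoder, obs, demo_obs, horizon=5):
    """Hedged sketch of adaptive 'near future' subgoals via latent arithmetic.

    encoder : maps observations to latent vectors, (D,) or stacked (T, D).
    obs     : current observation tensor.
    demo_obs: (T, ...) single reference demonstration for the task.
    """
    with torch.no_grad():
        z_now = encoder(obs)                          # (D,) current latent
        z_demo = encoder(demo_obs)                    # (T, D) demo latents
        # Align: demo step whose latent is closest to the current state.
        i = torch.cdist(z_now[None], z_demo).argmin().item()
        j = min(i + horizon, z_demo.shape[0] - 1)
        # Compose the subgoal by adding the demo's latent displacement,
        # recalculated every step so it always sits in the near future.
        return z_now + (z_demo[j] - z_demo[i])
```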

    DeFeat-Net: General Monocular Depth via Simultaneous Unsupervised Representation Learning

In current monocular depth research, the dominant approach is to employ unsupervised training on large datasets, driven by warped photometric consistency. Such approaches lack robustness and are unable to generalize to challenging domains such as nighttime scenes or adverse weather conditions, where assumptions about photometric consistency break down. We propose DeFeat-Net (Depth & Feature network), an approach to simultaneously learn a cross-domain dense feature representation alongside a robust depth-estimation framework based on warped feature consistency. The resulting feature representation is learned in an unsupervised manner with no explicit ground-truth correspondences required. We show that within a single domain, our technique is comparable to both the current state of the art in monocular depth estimation and supervised feature representation learning. However, by simultaneously learning features, depth and motion, our technique is able to generalize to challenging domains, allowing DeFeat-Net to outperform the current state of the art with around a 10% reduction in all error measures on more challenging sequences such as nighttime driving.
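
The shift from photometric to feature consistency can be shown in a few lines: given a sampling grid produced by projecting target pixels into the source view with the predicted depth and pose (the projection itself is omitted here), the loss compares dense features rather than raw colours. The cosine-distance formulation and function name are assumptions, not necessarily the paper's exact loss.

```python
import torch
import torch.nn.functional as F

def feature_consistency_loss(feat_tgt, feat_src, grid_src_to_tgt):
    """Hedged sketch of warped feature consistency.

    feat_tgt, feat_src: (1, C, H, W) dense feature maps of the two views.
    grid_src_to_tgt: (1, H, W, 2) sampling grid in [-1, 1], obtained from
    predicted depth and relative pose (projection step assumed elsewhere).
    """
    # Warp source features into the target frame.
    feat_warp = F.grid_sample(feat_src, grid_src_to_tgt,
                              align_corners=True, padding_mode="border")
    # Feature-space similarity is more robust than raw photometry when
    # appearance changes (night, rain) break brightness constancy.
    sim = F.cosine_similarity(feat_tgt, feat_warp, dim=1)   # (1, H, W)
    return (1 - sim).mean()
```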

    Same Features, Different Day: Weakly Supervised Feature Learning for Seasonal Invariance

    "Like night and day" is a commonly used expression to imply that two things are completely different. Unfortunately, this tends to be the case for current visual feature representations of the same scene across varying seasons or times of day. The aim of this paper is to provide a dense feature representation that can be used to perform localization, sparse matching or image retrieval, regardless of the current seasonal or temporal appearance. Recently, there have been several proposed methodologies for deep learning dense feature representations. These methods make use of ground truth pixel-wise correspondences between pairs of images and focus on the spatial properties of the features. As such, they don't address temporal or seasonal variation. Furthermore, obtaining the required pixel-wise correspondence data to train in cross-seasonal environments is highly complex in most scenarios. We propose Deja-Vu, a weakly supervised approach to learning season invariant features that does not require pixel-wise ground truth data. The proposed system only requires coarse labels indicating if two images correspond to the same location or not. From these labels, the network is trained to produce "similar" dense feature maps for corresponding locations despite environmental changes. Code will be made available at: https://github.com/jspenmar/DejaVu_Feature

    Westerlund 1 as a Template for Massive Star Evolution

With a dynamical mass M_dyn ~ 1.3x10^5 M_sun and a lower limit M_cl > 5x10^4 M_sun from star counts, Westerlund 1 is the most massive young open cluster known in the Galaxy and thus the perfect laboratory to study massive star evolution. We have developed a comprehensive spectral classification scheme for supergiants based on features in the 6000-9000 Å range, which allows us to identify >30 very luminous supergiants in Westerlund 1 and ~100 other less evolved massive stars, which join the large population of Wolf-Rayet stars already known. Though detailed studies of these stars are still pending, preliminary rough estimates suggest that the stars we see are evolving to the red part of the HR diagram at approximately constant luminosity.
Comment: To be published in Proceedings of IAU Symposium 250: Massive Stars as Cosmic Engines, held in Kaua'i (Hawaii, USA), Dec 2007, edited by F. Bresolin, P.A. Crowther & J. Puls (Cambridge University Press)

    Sign Language Transformers: Joint End-to-end Sign Language Recognition and Translation

Prior work on Sign Language Translation has shown that having a mid-level sign gloss representation (effectively recognizing the individual signs) improves the translation performance drastically. In fact, the current state of the art in translation requires gloss-level tokenization in order to work. We introduce a novel transformer-based architecture that jointly learns Continuous Sign Language Recognition and Translation while being trainable in an end-to-end manner. This is achieved by using a Connectionist Temporal Classification (CTC) loss to bind the recognition and translation problems into a single unified architecture. This joint approach does not require any ground-truth timing information, simultaneously solves two co-dependent sequence-to-sequence learning problems, and leads to significant performance gains. We evaluate the recognition and translation performance of our approach on the challenging RWTH-PHOENIX-Weather-2014T (PHOENIX14T) dataset. We report state-of-the-art sign language recognition and translation results achieved by our Sign Language Transformers. Our translation networks outperform both sign-video-to-spoken-language and gloss-to-spoken-language translation models, in some cases more than doubling the performance (9.58 vs. 21.80 BLEU-4 score). We also share new baseline translation results using transformer networks for several other text-to-text sign language translation tasks.
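
A minimal sketch of the joint architecture described above: a shared transformer encoder over per-frame sign video features feeds both a CTC gloss head (recognition, no timing labels needed) and an autoregressive decoder (translation). Class and parameter names are assumptions; dimensions and head sizes are illustrative.

```python
import torch
import torch.nn as nn

class JointSignTransformer(nn.Module):
    """Hedged sketch: joint CTC recognition + translation heads."""

    def __init__(self, feat_dim, n_gloss, n_words, d_model=256):
        super().__init__()
        self.embed = nn.Linear(feat_dim, d_model)
        self.transformer = nn.Transformer(d_model=d_model, nhead=4,
                                          batch_first=True)
        self.gloss_head = nn.Linear(d_model, n_gloss + 1)  # +1 for CTC blank
        self.word_embed = nn.Embedding(n_words, d_model)
        self.word_head = nn.Linear(d_model, n_words)

    def forward(self, video_feats, spoken_in):
        # Shared encoding of per-frame sign video features: (B, T, feat_dim).
        memory = self.transformer.encoder(self.embed(video_feats))
        # Recognition branch: frame-wise gloss log-probs for a CTC loss.
        gloss_log_probs = self.gloss_head(memory).log_softmax(-1)
        # Translation branch: decode spoken-language tokens over the memory.
        out = self.transformer.decoder(self.word_embed(spoken_in), memory)
        return gloss_log_probs, self.word_head(out)
```

Training would then sum a CTC loss on the gloss log-probabilities (transposed to (T, B, C) as nn.CTCLoss expects) with a cross-entropy loss on the word logits, binding recognition and translation into the single unified architecture the abstract describes.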